Stabilizing Minimum Error Rate Training
Authors
Abstract
The most commonly used method for training feature weights in statistical machine translation (SMT) systems is Och’s minimum error rate training (MERT) procedure. A well-known problem with Och’s procedure is that it tends to be sensitive to small changes in the system, particularly when the number of features is large. In this paper, we quantify the stability of Och’s procedure by supplying different random seeds to a core component of the procedure (Powell’s algorithm). We show that for systems with many features, there is extensive variation in outcomes, both on the development data and on the test data. We analyze the causes of this variation and propose modifications to the MERT procedure that improve stability while helping performance on test data.
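The run-to-run variation described above can be illustrated with a toy experiment. The sketch below is hypothetical and not the paper's actual setup: it runs a crude randomized coordinate search (a stand-in for the randomized parts of Powell's algorithm) from seed-dependent starting points on a small piecewise-constant error surface, mimicking how corpus-level error behaves as a function of feature weights, and reports the spread of final dev-set errors across seeds.

```python
import random

def dev_error(w):
    """Toy nonconvex, piecewise-constant error surface, standing in for
    corpus error (1 - BLEU) as a function of two feature weights."""
    x, y = w
    return round(abs(x - 1.0), 1) + round(abs(y + 0.5), 1) + (0.3 if x * y > 0 else 0.0)

def powell_like_search(seed, iters=50):
    """Greedy coordinate search with a seed-dependent start and axis order --
    a stand-in for the randomized component of Powell's algorithm in MERT."""
    rng = random.Random(seed)
    w = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
    best = dev_error(w)
    for _ in range(iters):
        axis = rng.randrange(2)
        for step in (-0.5, -0.1, 0.1, 0.5):
            cand = list(w)
            cand[axis] += step
            e = dev_error(cand)
            if e < best:
                best, w = e, cand
    return best

# Different seeds can settle on different local optima of the flat-plateaued
# surface, mirroring the seed-to-seed variation the paper quantifies.
results = [powell_like_search(s) for s in range(10)]
print("dev-error spread across seeds:", min(results), "to", max(results))
```

On a real system the same experiment would rerun MERT end-to-end per seed and compare the resulting dev and test BLEU scores.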
Similar papers
Improved performance and generalization of minimum classification error training for continuous speech recognition
Discriminative training of hidden Markov models (HMMs) using segmental minimum classification error (MCE) training has been shown to work extremely well for certain speech recognition applications. It is, however, somewhat prone to overspecialization. This study investigates various techniques that improve the performance and generalization of the MCE algorithm. Improvements of up to 7% in relative...
Minimum rank error training for language modeling
Discriminative training techniques have been successfully developed for many pattern recognition applications. In speech recognition, discriminative training aims to minimize the metric of word error rate. However, in an information retrieval system, the best performance should be achieved by maximizing the average precision. In this paper, we construct the discriminative n-gram language model ...
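The average-precision objective this snippet contrasts with word error rate is the standard ranked-retrieval metric: the mean of precision@k over the ranks k at which relevant items appear. A minimal sketch (not from the paper):

```python
def average_precision(relevance):
    """Average precision for one ranked list.

    `relevance` is a list of 0/1 judgments in rank order; AP is the mean
    of precision@k taken at each rank k where a relevant item appears.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / max(hits, 1)

# Relevant items at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision([1, 0, 1, 0, 0]))  # → 0.8333...
```

Maximizing the mean of this quantity over queries rewards ranking relevant items early, which is why it is the natural training criterion for retrieval rather than word error rate.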
Adaptive multiuser receivers for DS-CDMA using minimum BER gradient-Newton algorithms
In this paper we investigate the use of adaptive minimum bit error rate (MBER) gradient-Newton algorithms in the design of linear multiuser detectors (MUDs) for DS-CDMA systems. The proposed algorithms approximate the bit error rate (BER) from training data using linear multiuser detection structures. A comparative analysis of linear MUDs, employing minimum mean squared error (MMSE), previously ...
Minimum divergence based discriminative training
We propose to use Minimum Divergence (MD) as a new measure of errors in discriminative training. To focus on improving discrimination between any two given acoustic models, we refine the error definition in terms of the Kullback-Leibler Divergence (KLD) between them. The new measure can be regarded as a modified version of Minimum Phone Error (MPE), but with a higher resolution than just a symbol mat...
Minimum classification error training of hidden Markov models for acoustic language identification
The goal of acoustic Language Identification (LID) is to identify the language of spoken utterances. The described system is based on parallel Hidden Markov Model (HMM) phoneme recognizers. The standard approach to learning HMM parameters is Maximum Likelihood (ML) estimation, which is not directly related to the classification error rate. Based on the Minimum Class...